Add graph validation and statistics logging for debugging graph construction #519
princekumarlahon wants to merge 4 commits into mllam:main
Happy to iterate on this if any changes are needed!
kshirajahere left a comment
Hey, thanks for your work on this. I found a couple of things that are worth clearing up, and I’ve left inline comments with the details.
The PR also includes an unrelated CustomMLFlowLogger type-hint cleanup in custom_loggers.py.
neural_lam/create_graph.py
Outdated
import torch
...
degrees = torch.bincount(edge_index[1], minlength=num_nodes)
These stats use only in-degree (edge_index[1]), but the log messages say "degree" and "isolated nodes" as if they described the whole graph. For directed graphs like g2m, source-only nodes are counted as isolated even when they have outgoing edges, so the debug output is misleading.
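To make the concern concrete, here is a minimal sketch that counts isolated nodes two ways on a toy g2m-style graph. It uses plain Python lists in place of tensors (an assumption, so that it runs without PyTorch), and `count_degrees` and the edge list are hypothetical, not taken from the PR.

```python
# Toy directed graph: nodes 0-2 are sources only, nodes 3-4 are targets only.
# edge_index follows the [2, num_edges] layout: row 0 = sources, row 1 = targets.
edge_index = [
    [0, 1, 2, 2],  # source node ids
    [3, 3, 4, 3],  # target node ids
]
num_nodes = 5


def count_degrees(ids, num_nodes):
    """Count occurrences of each node id (hypothetical stand-in for a bincount)."""
    counts = [0] * num_nodes
    for i in ids:
        counts[i] += 1
    return counts


in_degree = count_degrees(edge_index[1], num_nodes)
out_degree = count_degrees(edge_index[0], num_nodes)
total_degree = [i + o for i, o in zip(in_degree, out_degree)]

# In-degree alone reports every source-only node as isolated.
isolated_by_in_degree = sum(1 for d in in_degree if d == 0)
# Counting both directions shows that no node is actually disconnected.
isolated_by_total_degree = sum(1 for d in total_degree if d == 0)
```

In this toy case the in-degree count flags the three source nodes as isolated, while the total-degree count flags none.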
if edge_index.min() < 0:
    raise ValueError(f"[{name}] found negative node indices")
...
if edge_index.max() >= num_nodes:
This check is incompatible with the hierarchical m2m graphs below, because from_networkx_with_start_index() intentionally keeps globally offset node ids. For levels 1 and above, edge_index.max() can be much larger than num_nodes even though the graph is valid, so this check now breaks hierarchical graph generation.
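The failure mode can be sketched as follows, along with one possible offset-aware alternative. This assumes that a level's node ids form a contiguous range starting at a known global offset, which may not match the actual code; `check_ids_in_range` and `start_index` are hypothetical names, and plain lists stand in for tensors.

```python
def check_ids_in_range(edge_index, num_nodes, start_index=0, name="graph"):
    """Raise ValueError unless all ids lie in [start_index, start_index + num_nodes)."""
    ids = edge_index[0] + edge_index[1]
    if not ids:
        return  # nothing to check for an empty edge list
    lo, hi = start_index, start_index + num_nodes
    if min(ids) < lo or max(ids) >= hi:
        raise ValueError(
            f"[{name}] node ids must lie in [{lo}, {hi}), "
            f"got min={min(ids)}, max={max(ids)}"
        )


# Hypothetical level-1 mesh: 4 nodes whose global ids start at 100.
level1_edges = [[100, 101, 102], [101, 102, 103]]

# A zero-based bound (ids < num_nodes) rejects this valid graph.
try:
    check_ids_in_range(level1_edges, num_nodes=4, name="m2m level 1")
    zero_based_ok = True
except ValueError:
    zero_based_ok = False

# An offset-aware bound accepts it.
check_ids_in_range(level1_edges, num_nodes=4, start_index=100, name="m2m level 1")
offset_ok = True
```

Another option would be to skip the upper-bound check for hierarchical levels entirely. Which is preferable depends on how the ids are laid out in practice.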
Thanks for the feedback! I've updated the stats to use total degree (in + out) and added separate logging for in-degree and out-degree. I also added a small guard for empty graphs to avoid edge-case issues. Let me know if this looks good!
Describe your changes
This PR introduces lightweight graph validation and diagnostic utilities to improve debugging during graph construction.
It adds two helper functions:
- validate_graph: performs sanity checks on graph structure (shape, empty edges, invalid indices) and fails early with clear error messages.
- compute_graph_stats: logs useful statistics about the graph, including number of nodes, edges, degree distribution, and isolated nodes.
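As a rough illustration of the kind of early checks described above (not the PR's actual code), the sketch below validates a `[2, num_edges]` edge list held as plain Python lists. The name `validate_graph_sketch`, its signature, and the sample graphs are all hypothetical.

```python
def validate_graph_sketch(edge_index, name="graph"):
    """Fail early on structural problems in a [2, num_edges] edge list."""
    if len(edge_index) != 2:
        raise ValueError(
            f"[{name}] edge_index must have 2 rows, got {len(edge_index)}"
        )
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError(f"[{name}] source and target rows differ in length")
    if len(sources) == 0:
        raise ValueError(f"[{name}] graph has no edges")
    if min(min(sources), min(targets)) < 0:
        raise ValueError(f"[{name}] found negative node indices")


# Hypothetical graphs: one valid, one malformed, one empty.
graphs = {
    "g2m": [[0, 1], [2, 3]],
    "m2g": [[2, 3], []],
    "m2m": [[], []],
}

# Collect one error message per failing graph instead of stopping at the first.
errors = {}
for graph_name, graph_edges in graphs.items():
    try:
        validate_graph_sketch(graph_edges, name=graph_name)
    except ValueError as exc:
        errors[graph_name] = str(exc)
```

Prefixing each message with the graph name keeps failures easy to attribute when several graphs are built in one run.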
These utilities are integrated into the graph creation pipeline for:
- g2m
- m2g
- m2m (per level)
Motivation and context
While working with graph construction, it can be difficult to quickly verify whether a generated graph is valid or understand its structure.
This change helps by:
Dependencies
No new dependencies are introduced.
Issue Link
closes #518